Abstract

Identification of jets originating from beauty and charm quarks is important for
measuring Standard Model processes and for searching for new physics. The
performance of algorithms developed to select b- and c-quark jets is measured using
data recorded by LHCb from proton-proton collisions at √s = 7 TeV in 2011 and at
√s = 8 TeV in 2012. The efficiency for identifying a b (c) jet is about 65% (25%) with
a probability for misidentifying a light-parton jet of 0.3% for jets with transverse
momentum pT > 20 GeV and pseudorapidity 2.2 < η < 4.2. The dependence of the
performance on the pT and η of the jet is also measured.
1 Introduction

Identification of jets that originate from the hadronization of beauty (b) and charm (c)
quarks is important for studying Standard Model (SM) processes and for searching for new
physics. For example, the ability to efficiently identify b jets with minimal misidentification
of c and light-parton jets is crucial for the measurement of top-quark production. The
study of tt̄ production in the forward region probes the structure of the proton and can
be used to search for physics beyond the SM. Measuring charge asymmetries in di-b-jet
production also probes beyond-the-SM physics. Furthermore, identification of c jets
is important for probing the structure of the proton, e.g. in W+c production.

The signature of a b or c jet is the presence of a long-lived b or c hadron that carries a
sizable fraction of the jet energy. The LHCb detector was designed to identify b and c
hadrons, and so is expected to perform well at identifying, or tagging, b and c jets. This
paper describes two algorithms for identifying b and c jets, one designed to identify both b
and c jets offline, and another initially designed to identify b-hadron decays in the trigger.
The performance of each algorithm is measured using several subsamples of the 3 fb⁻¹ of
proton-proton collision data collected at √s = 7 TeV in 2011 and at 8 TeV in 2012 by the
LHCb detector. The distributions of observable quantities used to discriminate between b,
c and light-parton jets are compared between data and simulation.

2 The LHCb detector 

The LHCb detector is a single-arm forward spectrometer covering the pseudorapidity
range 2 < η < 5, designed for the study of particles containing b or c quarks. The
detector includes a high-precision tracking system consisting of a silicon-strip vertex
detector surrounding the pp interaction region, a large-area silicon-strip detector
located upstream of a dipole magnet with a bending power of about 4 Tm, and three
stations of silicon-strip detectors and straw drift tubes placed downstream of the
magnet. The tracking system provides a measurement of momentum, p, of charged
particles with a relative uncertainty that varies from 0.5% at low momentum to 1.0% at
200 GeV (c = 1 throughout this paper). The minimum distance of a track to a primary
vertex, the impact parameter, is measured with a resolution of (15 + 29/pT) μm, where pT
is the component of the momentum transverse to the beam, in GeV. Different types of
charged hadrons are distinguished using information from two ring-imaging Cherenkov
detectors. Photons, electrons and hadrons are identified by a calorimeter system consisting
of scintillating-pad and preshower detectors, an electromagnetic calorimeter and a hadronic
calorimeter. The electromagnetic and hadronic calorimeters have an energy resolution of
σ(E)/E = 10%/√E ⊕ 1% and σ(E)/E = 69%/√E ⊕ 9% (with E in GeV), respectively.
Muons are identified by a system composed of alternating layers of iron and multiwire
proportional chambers.
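As a numerical illustration of the resolution formulas quoted above, the following minimal sketch evaluates the impact-parameter resolution (15 + 29/pT) μm and a calorimeter resolution of the form a/√E ⊕ b, where ⊕ denotes addition in quadrature. The function names are hypothetical and chosen only for this example.

```python
import math


def ip_resolution_um(pt_gev: float) -> float:
    """Impact-parameter resolution in micrometres: (15 + 29/pT), pT in GeV."""
    if pt_gev <= 0:
        raise ValueError("pT must be positive")
    return 15.0 + 29.0 / pt_gev


def calo_relative_resolution(energy_gev: float, stochastic: float, constant: float) -> float:
    """Relative energy resolution sigma(E)/E = stochastic/sqrt(E) (+) constant.

    The two terms are added in quadrature; E is in GeV.
    """
    if energy_gev <= 0:
        raise ValueError("E must be positive")
    return math.hypot(stochastic / math.sqrt(energy_gev), constant)


# Example: a 2 GeV track and a 100 GeV electromagnetic deposit (10%/sqrt(E) (+) 1%).
print(ip_resolution_um(2.0))
print(calo_relative_resolution(100.0, 0.10, 0.01))
```

The 1/pT term dominates the impact-parameter resolution for soft tracks, which is one reason displaced-track requirements are expressed in terms of a χ² rather than a fixed distance.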

The trigger consists of a hardware stage, based on information from the calorimeter
and muon systems, followed by a software stage, which applies a full event reconstruction.
This analysis requires that either a high-pT muon or a (b, c)-hadron¹ candidate satisfies
the trigger requirements. Events recorded due to the presence of a high-pT muon are
required to have a muon candidate with pT > 10 GeV. Events recorded due to the presence
of a (b, c)-hadron decay require that at least one track should have pT > 1.7 GeV and
χ²_IP with respect to any primary interaction greater than 16, where χ²_IP is defined as the
difference in χ² of a given primary pp interaction vertex (PV) reconstructed with and
without the considered track. Decays of b hadrons are inclusively identified by requiring a
two-, three- or four-track secondary vertex (SV) with a large sum of pT of the tracks and
a significant displacement from the PV. A specialized boosted decision tree (BDT)
algorithm is used for the identification of SVs consistent with the decay of a b hadron.
This inclusive trigger algorithm is called the topological trigger (TOPO) and is studied as
a b-jet tagger in this paper. Decays of long-lived c hadrons are identified either exclusively
using decay modes with large branching fractions, or in D*(2010)⁺ → D⁰π⁺ decays where
the D⁰ is selected inclusively by the presence of a two-track SV.

In the simulation, pp collisions are generated using Pythia with a specific LHCb
configuration [14]. Decays of hadronic particles are described by EvtGen [15], in which
final-state radiation is generated using Photos. The interaction of the generated
particles with the detector, and its response, are implemented using the Geant4 toolkit [17]
as described in Ref.
3 Jet identification algorithms

Jets are clustered using the anti-kT algorithm with a distance parameter 0.5, as
implemented in FastJet [20]. Information from all the detector sub-systems is used
to create charged and neutral particle inputs to the jet algorithm using a particle flow
approach. During 2011 and 2012, LHCb collected data with a mean number of pp
collisions per crossing of about 1.7. To reduce contamination from multiple pp interactions,
charged particles reconstructed within the vertex detector may only be clustered into a jet
if they are associated to the same PV. The identification of (b, c) jets is performed using
SVs from the decays of (b, c) hadrons. The choice of using SVs and not single-track or
other non-SV-based jet properties, e.g. the number of particles in the jet, is driven by the
need for a small misidentification probability of light-parton jets in the analyses performed
at LHCb. Furthermore, the properties of SVs from (b, c)-hadron decays are known to be
well modeled in LHCb simulation.


3.1 The SV tagger 

The tracks used as inputs to the SV-tagger algorithm are required to have pT > 0.5 GeV
and χ²_IP > 16. The χ²_IP requirement is rarely satisfied by tracks reconstructed from particles
originating directly from the PV. Hadronic particle identification is not used and, instead,
all particles are assigned the pion mass. In contrast to many other jet-tagging algorithms,
tracks are not required to have ΔR ≡ √(Δη² + Δφ²) < 0.5, where Δη (Δφ) is the difference
in pseudorapidity (azimuthal angle) between the track momentum and jet axis, since for
low-pT jets tracks outside of the jet cone help to discriminate between c and b jets.

¹The notation (b, c) is used to mean b or c throughout this paper.
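As an illustration of the angular separation ΔR = √(Δη² + Δφ²) between a track and the jet axis, a minimal sketch is given below. The only subtlety is that the azimuthal difference must be wrapped into [-π, π] before squaring; the function name is hypothetical.

```python
import math


def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    """Angular separation sqrt(d_eta^2 + d_phi^2) between two directions."""
    d_eta = eta1 - eta2
    # math.remainder wraps the azimuthal difference into [-pi, pi].
    d_phi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(d_eta, d_phi)


# A track at (eta, phi) = (3.4, 0.3) relative to a jet axis at (3.0, 0.0).
print(delta_r(3.4, 0.3, 3.0, 0.0))
```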

All possible two-track SVs are built using pairs of the input tracks such that the
distance of closest approach between the tracks is less than 0.2 mm, the vertex-fit χ² < 10
and the two-body mass is in the range 0.4 GeV < M < M(B), where M(B) is the nominal
B⁰ mass [22]. Since all particles are assigned a pion mass, the upper mass requirement
rarely removes SVs from any long-lived b hadrons. The lower mass requirement removes
SVs from most strange-particle decays, including the Λ baryon whose computed mass is
always below 0.4 GeV when the proton is assigned a pion mass. At this stage tracks are
allowed to belong to multiple SVs. Next, all two-track SVs with ΔR < 0.5 relative to the
jet axis, where the direction of flight is taken as the PV to SV vector, are collected as
candidates for a so-called linking procedure. This procedure involves merging SVs that
share tracks until none of the remaining SVs with ΔR < 0.5 share tracks. The SV position
is taken to be the weighted average of the 2-body SV positions using the inverse of the
2-body vertex χ² values as the weights.
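The linking step can be sketched as follows. This is a minimal illustration rather than the LHCb implementation: it assumes the input two-track SVs have already passed the ΔR < 0.5 requirement, it merges every set of SVs connected through shared tracks, and it places the linked SV at the average of the two-body positions weighted by the inverse vertex χ². The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TwoTrackSV:
    tracks: frozenset   # identifiers of the two tracks forming the vertex
    position: tuple     # (x, y, z) of the two-body vertex, in mm
    chi2: float         # two-body vertex-fit chi2 (assumed > 0)


def link_svs(svs):
    """Merge two-track SVs that share tracks; return (track_set, position) pairs."""
    groups = []  # groups of two-track SVs connected through shared tracks
    for sv in svs:
        merged, kept = [sv], []
        for group in groups:
            if any(sv.tracks & member.tracks for member in group):
                merged.extend(group)   # the new SV bridges into this group
            else:
                kept.append(group)
        groups = kept + [merged]

    linked = []
    for group in groups:
        tracks = frozenset().union(*(member.tracks for member in group))
        weights = [1.0 / member.chi2 for member in group]
        total = sum(weights)
        position = tuple(
            sum(w * member.position[i] for w, member in zip(weights, group)) / total
            for i in range(3)
        )
        linked.append((tracks, position))
    return linked


candidates = [
    TwoTrackSV(frozenset({1, 2}), (0.0, 0.0, 10.0), 1.0),
    TwoTrackSV(frozenset({2, 3}), (0.0, 0.0, 20.0), 1.0),
    TwoTrackSV(frozenset({4, 5}), (1.0, 1.0, 5.0), 2.0),
]
for tracks, position in link_svs(candidates):
    print(sorted(tracks), position)
```

In this example the first two candidates share track 2 and are merged into a single three-track SV, while the third remains a two-track SV.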

The linking procedure can produce SVs that contain any number of tracks. The linked
n-track SVs are required to have pT > 2 GeV, significant spatial separation from the PV,
and to contain at most one track with ΔR > 0.5 relative to the jet axis. If the SV has only
two tracks and a mass consistent with that of the K⁰_S, the SV is rejected. Interactions
with material, and strange-particle decays, are suppressed by requiring that the flight
distance divided by the momentum of the SV is less than 1.5 mm/GeV; this quantity
serves as a proxy for the hadron lifetime. The SV position is also required to be within a
restricted region consistent with that of (b, c)-hadron decays.
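To see why the flight distance divided by the momentum acts as a lifetime proxy, note that the mean flight distance of a hadron of mass m and momentum p is βγcτ = (p/m)cτ, so the mean of flight distance over momentum is cτ/m, independent of p. The sketch below evaluates this ratio for a few hadrons. The masses and cτ values are approximate world averages inserted here as assumptions for illustration, and the sketch ignores that the SV momentum only includes the reconstructed tracks and that individual decay lengths are exponentially distributed.

```python
# Approximate (mass in GeV, c*tau in mm); illustrative values, not taken from the paper.
HADRONS = {
    "B0": (5.280, 0.455),
    "D+": (1.870, 0.312),
    "D0": (1.865, 0.123),
    "K0S": (0.498, 26.8),
    "Lambda": (1.116, 78.9),
}

MAX_FD_OVER_P = 1.5  # mm/GeV, the requirement quoted in the text


def mean_fd_over_p(mass_gev: float, ctau_mm: float) -> float:
    """Mean flight distance per unit momentum, <FD>/p = c*tau / m, in mm/GeV."""
    return ctau_mm / mass_gev


for name, (mass, ctau) in HADRONS.items():
    ratio = mean_fd_over_p(mass, ctau)
    verdict = "typically kept" if ratio < MAX_FD_OVER_P else "typically rejected"
    print(f"{name:7s} {ratio:8.3f} mm/GeV  {verdict}")
```

Under these assumptions the b and c hadrons sit well below the 1.5 mm/GeV requirement, while the strange hadrons lie far above it.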

An important quantity for discriminating between hadron types is the so-called corrected
mass defined as

    M_{\rm cor} = \sqrt{M^2 + p^2 \sin^2\theta} + p \sin\theta,    (1)

where M and p are the invariant mass and momentum of the particles that form the SV
and θ is the angle between the momentum and the direction of flight of the SV. The
corrected mass is the minimum mass that the long-lived hadron can have that is consistent
with the direction of flight. The linked n-track SVs are required to have Mcor > 0.6 GeV
to remove any remaining kaon or hyperon decays. A few percent of jets contain multiple
SVs that pass all requirements; in such cases the SV with the highest pT is chosen. The
fraction of multi-SV-tagged jets is consistent in data and simulation.
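The corrected mass, Mcor = √(M² + p² sin²θ) + p sin θ, can be evaluated directly from the SV invariant mass, the SV momentum vector and the PV-to-SV flight vector. The sketch below is illustrative only; it obtains p sin θ from the cross product of the momentum with the flight vector, which avoids computing the angle explicitly. The function name and argument layout are assumptions.

```python
import math


def corrected_mass(mass, momentum, flight):
    """Corrected mass in GeV.

    mass     -- invariant mass of the SV tracks, in GeV
    momentum -- (px, py, pz) of the SV, in GeV
    flight   -- (dx, dy, dz) vector from the PV to the SV (any length unit)
    """
    px, py, pz = momentum
    dx, dy, dz = flight
    cross = (py * dz - pz * dy, pz * dx - px * dz, px * dy - py * dx)
    flight_length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if flight_length == 0.0:
        raise ValueError("flight vector must be non-zero")
    # |p x d| / |d| = p sin(theta): momentum transverse to the flight direction.
    p_perp = math.sqrt(sum(c * c for c in cross)) / flight_length
    return math.sqrt(mass * mass + p_perp * p_perp) + p_perp


# Momentum aligned with the flight direction: no missing transverse momentum, Mcor = M.
print(corrected_mass(1.5, (0.0, 0.0, 10.0), (0.0, 0.0, 2.0)))
# 3 GeV of momentum transverse to the flight direction raises Mcor above M.
print(corrected_mass(4.0, (3.0, 0.0, 10.0), (0.0, 0.0, 1.0)))
```

When the reconstructed momentum points along the flight direction, the correction vanishes; any transverse imbalance is attributed to unreconstructed decay products and increases Mcor.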

Two BDTs are used to identify b and c jets: BDT(bc|udsg) trained to separate (b, c)
jets from light-parton jets and BDT(b|c) trained to separate b jets from c jets. Both BDTs
are trained on simulated samples of b, c and light-parton jets. The inputs to both BDTs
are as follows:

• the SV mass M;
• the SV corrected mass Mcor;
• the transverse flight distance of the two-track SV closest to the PV;
• the fraction of the jet pT carried by the SV, pT(SV)/pT(jet);
• ΔR between the SV flight direction and the jet;
• the number of tracks in the SV;
• the number of SV tracks with ΔR < 0.5 relative to the jet axis;
• the net charge of the tracks that form the SV;
• the flight distance χ²;
• the sum of all SV track χ²_IP.

For jets that contain an SV passing all of the requirements, the two BDT responses are
used to identify the jet as either b, c or light-parton.


3.2 The topological trigger 

The topological trigger algorithm uses SVs that satisfy similar criteria to those used in
the SV-tagger algorithm to build two-, three- and four-track SVs. The TOPO SVs are
required to have large pT and significant flight distance from the PV. The TOPO provides
an efficient trigger option for generic b-jet events, as the SV used by the TOPO to trigger
recording of the event can also be used to tag a b jet. The BDT used in the TOPO
algorithm uses the following inputs:

• the SV mass;
• the SV corrected mass;
• the sum of the pT of the SV tracks;
• the maximum distance of closest approach between the SV tracks;
• the χ²_IP of the SV formed using the momentum of the tracks that form the SV and
  the SV position;
• the flight distance of the SV from the PV;
• the minimum pT of the SV tracks.

To ensure stability during data-taking the TOPO BDT uses discretized inputs as described
in detail in Ref. Further details about the TOPO algorithm and its performance on
b-hadron decays as measured in LHCb data can be found in Ref.
Figure 1: SV-tagger algorithm BDT(b|c) versus BDT(bc|udsg) distributions obtained from
simulation for (left) b, (middle) c and (right) light-parton jets.


3.3 Performance in simulation 

Figure 1 shows the SV-tagger BDT distributions obtained from simulated W+jet events
for each jet type. The distributions in the two-dimensional BDT plane of SV-tagged b, c,
and light-parton jets are clearly distinguishable. The full two-dimensional distribution
is fitted in data to determine the jet flavor content. However, to aid in comparison to
other jet-tagging algorithms, a requirement of BDT(bc|udsg) > 0.2 is applied to display
the performance obtained from simulated events in Fig. 2. This requirement is about 90%
efficient on SV-tagged (b, c) jets and highly suppresses light-parton jets. The (b, c)-jet
efficiencies are nearly uniform for jet pT > 20 GeV and for 2.2 < η < 4.2, but are lower for
low-pT jets and for jets near the edges of the detector. The misidentification probability of
light-parton jets is less than 0.1% for low-pT jets and increases to about 1% at 100 GeV.
Figure 3 shows the (b, c)-jet efficiencies versus the mistag probability of light-parton jets
obtained by increasing the BDT(bc|udsg) cut.

For the TOPO algorithm, a BDT requirement is always applied in the trigger; the
requirement is looser when the SV contains a muon. In the LHCb measurement of the
charge asymmetry in bb̄ production, this same looser BDT requirement was applied to
tag a second jet in the event. Figure 2 shows the performance of the TOPO algorithm,
obtained from simulated events, for both the nominal and loose BDT requirements. The
nominal trigger BDT requirement strongly suppresses c and light-parton jets, with the
misidentification probability of light-parton jets being 0.01% for low-pT jets. Such a strong
suppression is required during online running due to output rate limitations.

The jet-tagging performance is measured in simulated events with one pp collision and
with two or more pp collisions and is found to be consistent. The tagging performance is also
studied in simulation using different event types, e.g. top-quark and QCD di-jet events,
with only small changes in the tagging efficiencies and BDT templates observed for (b, c)
jets. The mistag probability of light-parton jets is found to be higher for high-pT jets in
events that also contain (b, c) jets. This is discussed in detail in Sec. 5.
Figure 2: Efficiencies and mistag probabilities obtained from simulation for the SV-tagger
and TOPO algorithms for (top) b, (middle) c and (bottom) light-parton jets. The left plots
show the dependence on pT for 2.2 < η < 4.2, while the right plots show the dependence on
η for pT > 20 GeV (see text for details). The “loose” label for the TOPO refers to the BDT
requirement used in the trigger for SVs that contain muon candidates.


4 Efficiency measurements in data 

The tagging efficiencies for b and c jets are measured in data and compared with
expectations from simulation. To measure the tagging efficiencies in a given data sample, both the
number of tagged (b, c) jets and the total number of (b, c) jets must be determined. The
tagged (b, c) yields are obtained by fitting the SV-tagger or TOPO BDT distributions in the
subsample of jets that are tagged by an SV. The total number of (b, c) jets is determined by
fitting the χ²_IP distribution of the highest-pT track in the jet. The (b, c)-tagging efficiency
is the ratio of the tagged to the total (b, c)-jet yields.

Figure 3: Efficiencies for SV-tagging a (b, c) jet versus mistag probability for a light-parton jet
from simulation. The curves are obtained by varying the BDT(bc|udsg) requirement.

An alternative approach employed by other experiments (see, e.g., Ref.) is to
measure the efficiency using the subsample of jets that contain a muon. This approach has
the advantage that the (b, c)-jet content is enhanced due to the presence of muons from
the semileptonic decays of (b, c) hadrons; however, the disadvantage is that this method
assumes that mismodeling of the tagging performance is the same for semileptonic and
inclusive decays. Both the highest-pT track and muon-jet methods are used in this analysis
to study the jet-tagging performance.

Combined fits of several data samples enriched in (b, c) jets are performed to obtain
the tagging efficiencies. It is important to include the systematic uncertainties on both
the tagged and total (b, c)-jet yields for each data sample in the combined fits.

This section is arranged as follows: the data samples used are described in Sec. 4.1; the
BDT fits used to obtain the tagged (b, c)-jet yields are given in Sec. 4.2; the highest-pT-track
χ²_IP fits used to obtain the total (b, c)-jet yields are described in Sec. 4.3; the muon-jet
subsample method is discussed in Sec. 4.4; the systematic uncertainties on the tagged and
total (b, c)-jet yields are presented in Sec. 4.5; and the (b, c)-tagging efficiency results are
given in Sec. 4.6.

4.1 Data samples 

Events that contain either a high-pT muon or a fully reconstructed (b, c) hadron, referred
to here as an event-tag, are used to measure the jet-tagging efficiencies in data. The
highest-pT jet in the event that does not have any overlap with the event-tag is chosen as
the test jet. Each event-tag is required to have satisfied specific trigger requirements and
to have Δφ > 2.5 relative to the test-jet axis to reduce the possibility of contamination
of the jet from the event-tag.² Therefore, all events used to measure the (b, c)-tagging
efficiency have passed the trigger independently of the presence of the test jet, which
ensures that the trigger does not bias the efficiency measurement. The following event-tags
are used (labeled by the data-set identifier):

• (B+jet) a fully reconstructed b-hadron decay which enriches the b-jet content of the
  test-jet sample;

• (D+jet) a fully reconstructed c-hadron decay which enriches the c-jet and b-jet
  content of the test-jet sample (due to b → c decays);

• (μ(b, c)+jet) a displaced high-pT muon which enriches the c-jet and b-jet content of
  the test-jet sample;

• (W+jet) a prompt isolated high-pT muon indicative of W+jet events, a sample that consists of
  about 95% light-parton jets.

The first three samples are used to measure the (b, c)-jet identification efficiencies and
properties. The final sample is used to study misidentification of light-parton jets. In
all samples the event-tag and test jet are required to originate from the same PV. The
range 10 < pT(jet) < 100 GeV is considered since the data samples are not large enough
to measure the efficiency for jet pT > 100 GeV.

4.2 Tagged-jet yields 

The presence of an SV and its kinematic properties are used to discriminate between
b, c and light-parton jets. As described in Section 3, the SV-tagger algorithm uses two
BDTs while the TOPO uses one BDT for each SV. The tagged yields for each algorithm
are obtained by fitting to data BDT templates obtained from simulation for b, c and
light-parton jets. In all fits the template shapes are fixed and only the yields of each jet
type are free to vary.
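A template fit with fixed shapes and free yields can be sketched as follows. This is a minimal illustration and not the fitting code used in the analysis: it assumes binned data (a two-dimensional BDT histogram can be flattened into a list of bins), templates normalised to unit sum, and it maximises the binned Poisson likelihood with a simple expectation-maximisation iteration. The function name, the toy templates and the toy yields are all hypothetical.

```python
def fit_yields(data, templates, n_iter=2000):
    """Fit the yield of each jet type to binned data using fixed-shape templates.

    data      -- list of observed bin contents
    templates -- dict: jet type -> list of bin fractions (each summing to 1)
    Returns a dict: jet type -> fitted yield.
    """
    n_bins = len(data)
    total = float(sum(data))
    # Start from equal yields; only the yields change, never the shapes.
    yields = {name: total / len(templates) for name in templates}
    for _ in range(n_iter):
        expected = [
            sum(yields[name] * templates[name][i] for name in templates)
            for i in range(n_bins)
        ]
        # EM update: share each bin's observed content among the jet types in
        # proportion to their current expected contributions to that bin.
        yields = {
            name: sum(
                data[i] * yields[name] * templates[name][i] / expected[i]
                for i in range(n_bins)
                if expected[i] > 0.0
            )
            for name in templates
        }
    return yields


# Toy example with three bins and three jet types.
toy_templates = {
    "b": [0.05, 0.15, 0.80],
    "c": [0.15, 0.70, 0.15],
    "light": [0.85, 0.10, 0.05],
}
toy_data = [160, 310, 530]  # built from yields of 600 b, 300 c and 100 light-parton jets
print(fit_yields(toy_data, toy_templates))
```

In practice a dedicated minimiser would also return uncertainties and correlations on the fitted yields; the iteration above is chosen only because it is short and needs no external libraries.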

Figures 4-6 show the results of fits performed to the two-dimensional SV-tagger BDT
distributions in the B+jet, D+jet and μ(b, c)+jet data samples. The b and c jets are
clearly distinguishable in the two-dimensional BDT distributions: b jets are mostly found
in the upper right corner, while c jets are found in the center-right and lower-right regions.
The light-parton jets cluster near the origin but are difficult to see due to the low SV-tag
probability of light-parton jets. The BDT templates for b, c and light-parton jets describe
the data well. A dedicated study of the modeling of the light-parton-jet BDT distributions
is discussed in Sec. 5.

²The event-tag samples are highly pure; however, when the event-tag is not properly reconstructed
the non-overlap requirements are not guaranteed to hold. Requiring that the event-tag and test jet are 
back-to-back in the transverse plane greatly reduces the probability that a particle originating from the 
event-tag decay but not reconstructed in the event-tag is reconstructed as part of the test jet. 
Figure 4: SV-tagger BDT fit results for the B+jet data sample with 10 < pT(jet) < 100 GeV:
(top left) distribution in data; (top right) two-dimensional template-fit result; and (bottom)
projections of the fit result with the b, c, and light-parton contributions shown as stacked
histograms.


A simple cross-check on the b, c and light-parton yields is performed by fitting only
two of the BDT inputs: the corrected mass defined in Eq. (1) and the number of tracks in
the SV. The corrected mass provides the best discrimination between c jets and other jet
types due to the fact that Mcor peaks near the D-meson mass for c jets.³ The number of
tracks in the SV identifies b jets well since b-hadron decays often produce many displaced
tracks. Figure 7 shows the results of a two-dimensional fit to these two SV properties.
The absolute fractions of b, c and light-parton SV tags agree with the BDT fit results
to within 1-2%. The corrected mass has been previously used in LHCb jet analyses for
determining the c-jet yield and for extracting the b-jet yield [25]. The clear peaking
structure for c jets, which relies on the excellent vertex resolution of the LHCb detector,
makes them easily identifiable.

Figure 8 shows the results of fitting the TOPO BDT distributions in the various data
samples using b, c and light-parton jet template shapes obtained from simulation. The
ratios of SV-tagger to TOPO SV-tagged b, c and light-parton jets are each consistent with
expectations from simulation. The modeling of both the SV-tagger and TOPO SV properties
is sufficient to allow the SV-tagged content to be accurately determined.

³This is true for all long-lived c hadrons when all tracks are assigned a pion mass.
Figure 5: Same as Fig. 4 for the D+jet data sample.

Figure 6: Same as Fig. 4 for the μ(b, c)+jet data sample.

Figure 7: Two-dimensional Mcor versus SV track multiplicity fit results for (top) B+jet, (middle)
D+jet and (bottom) μ(b, c)+jet data samples. The left plots show the projection onto the Mcor
axis, while the right plots show the projection onto the track multiplicity. The highest Mcor bin
includes candidates with Mcor > 10 GeV.


4.3 Efficiency measurement using highest-pT tracks

To determine the jet-tagging efficiency, the jet composition prior to applying the SV tag
must be determined. This is necessarily more difficult than determining the SV-tagged
composition. The χ²_IP distribution of the highest-pT track in the jet is used for this task.
For light-parton jets the highest-pT track will mostly originate from the PV, while for
(b, c) jets the highest-pT track will often originate from the decay of the (b, c) hadron. To
avoid possible issues with modeling of soft radiation, only the subset of jets for which the
highest-pT track satisfies pT(track)/pT(jet) > 10%, which is about 95% of all jets, is used.

Figure 8: Fits to the TOPO BDT distribution in (left) B+jet, (middle) D+jet and (right)
μ(b, c)+jet data samples with 10 < pT(jet) < 100 GeV.

Since the W+jet sample is dominantly composed of light-parton jets, the χ²_IP detector
response can be obtained in a data-driven way using this data sample. First, the
two-dimensional SV-tagger BDT response is fitted to determine the SV-tagged b, c and
light-parton jet yields. The tagging efficiencies obtained in simulation for b and c jets are
used to estimate the total number of b and c jets in the W+jet data sample. Since the
b and c jets combined make up only 5% of the total data sample, any mismodeling of
the SV-tagging efficiency will have negligible impact on this study. The IP resolution is
studied by comparing the observed χ²_IP distributions in data with templates obtained from
simulation in bins of jet pT. The resolution in data is found to be about 10% worse than
in the simulation, which is consistent with previous LHCb studies of the IP resolution.
Figure 9 shows that the data-driven templates describe the data well. The difference in the
detector response between data and simulation is assumed to be universal and is applied
to correct the χ²_IP templates for b and c jets.

Figure 9: Results of χ²_IP calibration using W+jet data for 10 < pT(jet) < 100 GeV. The tail out
to large χ²_IP values in the light-parton-jet sample is largely due to strange-particle decays.

Figure 10 shows the results of fitting the χ²_IP distributions in the B+jet, D+jet and
μ(b, c)+jet data samples. Each sample consists of mostly light-parton jets prior to applying
an SV tag. While these data samples require that an event-tag containing a (b, c) quark is
reconstructed, the associated (b, c) quarks produced in hard scattering processes are often
not produced within the LHCb acceptance. Furthermore, for the (B, D)+jet samples, the
event-tags often have low pT so that the associated (b, c) quarks may be within the LHCb
acceptance but do not form a high-pT jet. The light-parton-jet χ²_IP template has a long
tail out to large values which arises due to hyperon and kaon decays. In the χ²_IP fits, the
log χ²_IP > 3 component of the light-parton template is allowed to vary independently to
allow for different s-quark content from the W+jet calibration sample. Apart from this,
all χ²_IP templates are fixed in shape. The efficiency for tagging a jet originated by a quark
of type q is determined as

    \varepsilon_q = N_q({\rm SV}) / N_q(\chi^2_{\rm IP}),    (2)

i.e. it is the ratio of the yield determined from fits to the SV-tagged BDT distributions,
either for the SV-tagger or TOPO algorithm, to the yield obtained from fits to the χ²_IP
distributions.
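The efficiency ratio of tagged to total yields can be evaluated as in the sketch below. The uncertainty propagation shown is an assumption made for illustration only: it treats the uncertainties on the two yields as uncorrelated and adds the relative uncertainties in quadrature, whereas the analysis itself extracts the efficiencies from combined fits. The function name and the example numbers are hypothetical.

```python
import math


def tagging_efficiency(n_tagged, err_tagged, n_total, err_total):
    """Return (efficiency, uncertainty) for efficiency = n_tagged / n_total.

    Assumes uncorrelated uncertainties on the two yields (a simplification).
    """
    if n_tagged <= 0 or n_total <= 0:
        raise ValueError("yields must be positive")
    eff = n_tagged / n_total
    rel = math.hypot(err_tagged / n_tagged, err_total / n_total)
    return eff, eff * rel


# Hypothetical yields: 650 +- 13 tagged jets out of 1000 +- 100 jets in total.
eff, err = tagging_efficiency(650.0, 13.0, 1000.0, 100.0)
print(f"efficiency = {eff:.3f} +- {err:.3f}")
```

With these toy inputs the uncertainty on the total yield dominates the result.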


4.4 Efficiency measurement using muon jets 


The approach described in the previous subsection has the advantage that it involves
measuring the efficiency on almost all of the jets in the data sample; however, its
disadvantage is the large light-parton-jet content, which results in 10-20% uncertainties on the
pre-SV-tag jet content. An approach used by other experiments is to measure the efficiency
on the subset of jets that contain muons. The tagging efficiencies are also obtained using
Eq. (2) for the muon-jet subsamples. Figures 11-13 show the SV-tagger BDT and χ²_IP fit
results for the muon-jet subsample of each data set. In these subsamples the χ²_IP is that
of the highest-pT muon in the jet. The muon is required to satisfy pT(μ)/pT(jet) > 10%.
The initial light-parton-jet content is greatly reduced in these data subsamples; however,
this approach only uses about 10% of the jets and it is possible that mismodeling of the
jet-tagging performance in semileptonic decays is not the same as for other decays.
Figure 10: Fits to the χ²_IP distribution in (top left) B+jet, (top right) D+jet and (bottom)
μ(b, c)+jet data samples.


4.5 Systematic uncertainties 

The systematic uncertainty on N(b,c)(SV) is estimated using the difference between the
(b, c) SV-tagged yields obtained from two different fits: the fit to the BDT distributions
and the fit to the Mcor versus track multiplicity distributions. The latter approach removes
jet quantities such as jet pT from the yield determination. While the absolute uncertainty
on the SV-tagged quark content as determined by the difference in these two methods is
only a few percent, the relative uncertainty is large for cases where a given jet type makes
up a small fraction of the SV-tagged data sample. For example, the relative uncertainty on
the c-jet yield in the B+jet data sample is large. As a further cross-check the (B, D)+jet
data samples are used to obtain data-driven BDT templates. The difference in (b, c) yields
obtained by fitting the W+jet data sample using the data-driven and simulation templates
is found to be negligible.

The systematic uncertainty on N(b,c)(χ²_IP) has several components. The nominal χ²_IP
fits allow the large-IP component of the light-parton-jet template to vary. The χ²_IP fits
are repeated fixing this component to that observed in W+jet data, with the difference in
(b, c)-jet yields assigned as a systematic uncertainty. This uncertainty is sizable for the case
of high-pT c jets, whose χ²_IP template is less distinct from that of light-parton jets, which
has a variable large-IP component in the fit. Possible dependence of the mismodeling of
the IP resolution on the origin point of the particle is studied and found to be negligible.
Figure 11: (top) SV-tagger two-dimensional BDT fit results projected onto the (left)
BDT(bc|udsg) and (right) BDT(b|c) axes and (bottom) χ²_IP fit results for the B+muon-jet
subsample with 10 < pT(jet) < 100 GeV.


For the case of muon jets, the misidentification probability of hadrons as muons and
the jet track multiplicity must be modeled properly to obtain an accurate χ²_IP distribution.
Mismodeling of these properties does not lead to large uncertainty on Nb(χ²_IP), since
the vast majority of reconstructed muons in b jets are truly muons that arise due to
semileptonic decays. For c jets, however, mismodeling of these properties can produce
sizable shifts in Nc(χ²_IP) due to the smaller fraction of c jets that contain muons from
semileptonic decays. A comparison between W+jet data and simulation of the jet fraction
that satisfies the muon-jet requirements, in bins of jet pT, is used to obtain an estimate of
the probability of misidentifying a jet as a muon jet. Based on this study a 5% relative
uncertainty is assigned to Nb(χ²_IP) and 20% to Nc(χ²_IP) for muon jets. Another possible
way of misidentifying muon jets is if the semileptonic decay of a b hadron outside of the
jet produces a muon reconstructed as part of the jet.⁴ The ΔR distribution between the
SV direction of flight and jet axis for all muons found in an SV is used to conclude that
this effect is at the per mille level; it is taken to be negligible.

Jets produced in different types of events can have different properties. The b-tag
efficiency is found to agree to about 1% in simulated W+b, top and QCD multi-jet events.

⁴This can also happen for semileptonic c-hadron decays; however, such decays rarely produce particles
with ΔR > 0.5 to the jet axis due to the much lower mass of c hadrons compared to that of b hadrons.
Figure 12: Same as Fig. 11 but for the D+muon-jet data sample.

Figure 13: Same as Fig. 11 but for the μ(b, c)+muon-jet data sample.
Table 1: Summary of relative systematic uncertainties (- denotes negligible). Systematic
uncertainties that depend on jet type and pT are marked by a * (see text for details).

source                                                  b jets    c jets
BDT templates*                                          ≈ 2%      ≈ 2%
light-parton-jet large-IP component*                    ≈ 5%      ≈ 10-30%
IP resolution                                           -         -
hadron-as-muon probability (muon-jet subsample only)    5%        20%
out-of-jet (b, c)-hadron decay                          -         -
gluon splitting                                         1%        1%
number of pp interactions per event                     -         -

The BDT shapes are studied in simulated single-jet b and di-jet bb̄ events and found to be 
consistent for low-pT jets but to show small discrepancies for large jet pT. For example, 
the absolute difference in efficiency of requiring BDT(bc|udsg) > 0.2 for b jets is less than 
1% up to a jet pT of 50 GeV but reaches about 3% at a jet pT of 100 GeV. In the data 
samples considered in this study, such effects are negligible, as using BDT templates from 
different event types results in differences in the SV-tagged yields of less than 1%. 

Events where multiple b hadrons are produced could affect the SV BDT shapes. The 
fraction of SVs that contain a track with ΔR > 0.5 relative to the jet axis is studied 
in data with the back-to-back requirement for the event-tag and test jet removed. The 
fraction of SVs that contain such a track is found to vary by at most a few percent as 
a function of ΔR between the event-tag and test jet. This could indicate percent-level 
cross-talk between multiple b jets or could be due to changes in the jet composition. For 
the efficiency measurements presented in this paper the effect of (b, c)-hadron decays 
outside of the jet is negligible; however, such decays could have an important impact on 
the tagging performance in some event types, e.g. in four-b-jet events. 

Gluon splitting to bb̄ or cc̄ can produce jets that contain multiple (b, c) hadrons, which 
have a higher tagging efficiency. The requirement that a (b, c)-hadron-decay signature is 
back-to-back with the test jet suppresses gluon-splitting contributions. The fraction of jets 
that contain multiple SVs in data is a few percent, which agrees to about 1% in all bins 
with simulated jets that contain only a single (b, c) hadron. The systematic uncertainty 
due to jets that contain multiple (b, c) hadrons from g → (bb̄, cc̄) is taken to be 1%. Finally, 
there is no evidence in simulation of dependence on the number of pp interactions in the 
event, so the uncertainty due to mismodeling of the number of pp interactions is taken to 
be negligible. The systematic uncertainties are summarized in Table 1. 
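
To gauge the overall size of the non-negligible entries in Table 1, they can be added in quadrature. This is only a rough sketch: the paper includes the uncertainties in the combined fit rather than summing them, the quadrature sum assumes the sources are independent, and the starred entries depend on jet type and pT, so the single values used here (including the lower edge of the 10-30% range for c jets) are representative choices, not results from the paper.

```python
import math

# Relative systematic uncertainties (percent) transcribed from Table 1.
# Starred sources vary with jet type and pT in the paper; the fixed values
# below are illustrative choices (ASSUMPTION), e.g. 10% is the lower edge
# of the 10-30% range quoted for c jets.
TABLE1_PERCENT = {
    "BDT templates": {"b": 2.0, "c": 2.0},
    "light-parton-jet large IP component": {"b": 5.0, "c": 10.0},
    "hadron-as-muon probability": {"b": 5.0, "c": 20.0},
    "gluon splitting": {"b": 1.0, "c": 1.0},
}


def total_relative_uncertainty(jet_type, muon_jet=False):
    """Quadrature sum of the Table 1 entries, ASSUMING independent sources."""
    total_sq = 0.0
    for source, values in TABLE1_PERCENT.items():
        # The hadron-as-muon term applies to the muon-jet subsample only.
        if source == "hadron-as-muon probability" and not muon_jet:
            continue
        total_sq += values[jet_type] ** 2
    return math.sqrt(total_sq)


if __name__ == "__main__":
    for jet_type in ("b", "c"):
        for muon_jet in (False, True):
            total = total_relative_uncertainty(jet_type, muon_jet)
            print(f"{jet_type} jets, muon-jet={muon_jet}: {total:.1f}%")
```

With these inputs the sum is dominated by the light-parton-jet large-IP component for the inclusive sample and by the hadron-as-muon probability for c jets in the muon-jet subsample.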

4.6 Results 

A combined fit to the B+jet, D+jet and μ(b, c)+jet data samples, including the systematic 
uncertainties in Table 1, is performed to obtain the (b, c)-jet tagging efficiencies. In these 
fits, both N_(b,c)(SV) and N_(b,c)(χ²_IP) are determined simultaneously under the constraint 
that the (b, c)-tagging efficiency in a given jet pT and η region must be the same in each 
data sample. The highest-pT track and muon-jet subsamples are fitted independently 
since the scale factors between data and simulation could be different for semileptonic 
and inclusive decays. The scale factors for b and c jets are allowed to vary independently 
since these may be different for different jet types. The misidentification probability of 
light-parton jets is allowed to vary freely in each data sample, although the results obtained 
are all consistent and agree with simulation. 
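
The quantities entering these fits can be illustrated with a minimal sketch. It assumes that the tagging efficiency is the ratio of the SV-tagged yield to the total yield and that the scale factor is the ratio of the efficiency in data to that in simulation. The function names, the simple binomial uncertainty and the uncorrelated error propagation are assumptions made for illustration; in the paper the yields and their uncertainties come from template fits.

```python
import math


def tagging_efficiency(n_tagged, n_total):
    """Return (efficiency, uncertainty) for n_tagged out of n_total jets.

    The uncertainty is a simple binomial estimate (ASSUMPTION); the paper
    obtains yields from template fits instead of raw counts.
    """
    if n_total <= 0 or not 0 <= n_tagged <= n_total:
        raise ValueError("need 0 <= n_tagged <= n_total and n_total > 0")
    eff = n_tagged / n_total
    err = math.sqrt(eff * (1.0 - eff) / n_total)
    return eff, err


def scale_factor(eff_data, err_data, eff_sim, err_sim):
    """Return (SF, uncertainty) with SF = eff_data / eff_sim.

    Relative uncertainties are added in quadrature, which ASSUMES the
    data and simulation efficiencies are uncorrelated.
    """
    sf = eff_data / eff_sim
    rel = math.sqrt((err_data / eff_data) ** 2 + (err_sim / eff_sim) ** 2)
    return sf, sf * rel


if __name__ == "__main__":
    # Hypothetical yields, chosen only to exercise the functions.
    eff_d, err_d = tagging_efficiency(620, 1000)
    eff_s, err_s = tagging_efficiency(6500, 10000)
    print(scale_factor(eff_d, err_d, eff_s, err_s))
```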

The scale factors for the SV-tagger algorithm are measured versus jet pT in the region 
2.2 < η < 4.2, where the efficiencies are expected to be nearly uniform versus η, and in the 
region 2 < η < 2.2 for jet pT > 20 GeV, where the efficiencies are nearly uniform versus 
jet pT (there are not sufficient statistics to measure the efficiencies in the η > 4.2 region). 
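
The measurement regions described above correspond to the rows of Table 2 and can be written down as a small lookup. The sketch below is illustrative: the function name and the half-open bin-edge convention are assumptions, since the source does not state how jets exactly on a bin edge are treated.

```python
# Measurement regions as listed in Table 2:
# (pT_low, pT_high, eta_low, eta_high), with pT in GeV.
REGIONS = [
    (10.0, 20.0, 2.2, 4.2),
    (20.0, 30.0, 2.2, 4.2),
    (30.0, 50.0, 2.2, 4.2),
    (50.0, 100.0, 2.2, 4.2),
    (20.0, 100.0, 2.0, 2.2),
]


def find_region(pt, eta):
    """Return the index in REGIONS of the region containing the jet.

    Bins are treated as half-open [low, high) (ASSUMPTION). Returns None
    for jets outside all regions, e.g. eta > 4.2 or low-pT jets at eta < 2.2.
    """
    for index, (pt_lo, pt_hi, eta_lo, eta_hi) in enumerate(REGIONS):
        if pt_lo <= pt < pt_hi and eta_lo <= eta < eta_hi:
            return index
    return None


if __name__ == "__main__":
    print(find_region(25.0, 3.0))  # central region, 20-30 GeV bin
    print(find_region(25.0, 2.1))  # 2 < eta < 2.2 region
    print(find_region(60.0, 4.5))  # outside the measured eta range
```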


The results versus jet px are shown in Fig. 14 and are summarized as follows: 


• The scale factors obtained from the highest-pT track approach are all consistent with 
unity at the ±20% level. They show no trend in pT for b or c jets. 

• The scale factors for muon jets are found to be consistent, albeit with large 
uncertainties, with those obtained using the highest-pT track approach. The results are 
combined assuming that the scale factors are the same for semileptonic and inclusive 
(b, c)-hadron decays (see Fig. 14) and are summarized in Table 2. The scale factors 
are consistent with unity for jet pT > 20 GeV, but 10-20% below unity for low-pT 
jets. 

• The scale-factor results obtained from the global fits are strongly anti-correlated 
between b and c jets. It is likely that the true scale factors are similar between b and 
c jets since many of the contributing factors, e.g. mismodeling of the SV position 
resolution, are expected to affect b and c jets in a similar manner. The highest-pT 
track fits are repeated assuming that the scale factors are the same for b and c 
jets (see Fig. 14) and summarized in Table 2. The results for jet pT > 20 GeV are 
consistent with unity at about the 5% level, while at low jet pT the scale factor is 
again less than unity by about 10%. The muon-jet results are not combined for b 
and c jets since the b-jet results are much more precise. 
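
As a rough illustration of what combining two scale-factor measurements means, the sketch below uses an inverse-variance weighted average. This is a simplified stand-in and not the method of the paper, which repeats the fits under the stated constraints; the weighted average also assumes uncorrelated inputs, which does not hold for the anti-correlated b- and c-jet results. The function name and the input values are hypothetical.

```python
def combine(measurements):
    """Inverse-variance weighted average of (value, sigma) pairs.

    ASSUMES the measurements are uncorrelated estimates of the same
    quantity. Returns (mean, sigma_of_mean).
    """
    if not measurements:
        raise ValueError("need at least one measurement")
    weights = [1.0 / sigma ** 2 for _, sigma in measurements]
    total_weight = sum(weights)
    mean = sum(w * value for w, (value, _) in zip(weights, measurements))
    mean /= total_weight
    return mean, (1.0 / total_weight) ** 0.5


if __name__ == "__main__":
    # Hypothetical inputs: a precise highest-pT-track result and a
    # less precise muon-jet result for the same jet type and bin.
    print(combine([(0.95, 0.05), (0.85, 0.15)]))
```

The more precise input dominates the average, which is the same qualitative behaviour that leads the paper to leave the muon-jet results uncombined for b and c jets.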

Neither of the assumptions made in the combinations has to be completely valid; however, 
they should each be a good approximation. Overall, the efficiencies measured in data are 
consistent with those in simulation for jet pT > 20 GeV with a conservative systematic 
uncertainty estimate of 10%. At low jet pT the scale factors are about 0.9 for b jets and 
0.8 for c jets. Using the difference in central values obtained from the highest-pT track, 
combined highest-pT track and muon jet, and combined b and c jet results, produces a 
conservative systematic uncertainty estimate of 10%. The absolute efficiencies measured 
assuming the scale factors are the same for b and c jets are given in Table 2. For jet 
pT > 20 GeV and 2.2 < η < 4.2, the mean SV-tagging efficiency is about 65% for b jets and 
25% for c jets. Finally, the TOPO algorithm efficiencies are measured in data and found 
to be consistent with simulation to about 5% for b jets and 20% for c jets (see Fig. 15). 
Figure 14: Efficiencies of the SV-tagger algorithm measured in data relative to those obtained 
from simulation for 2.2 < η < 4.2: (top left) results from the (closed markers) highest-pT track 
and (open markers) muon-jet samples; (top right) the combined results assuming the scale factors 
are the same for semileptonic and inclusive (b, c)-hadron decays; and (bottom left) the combined 
results for (b, c) jets using the highest-pT-track approach assuming the scale factors are the same 
for b and c jets. (Bottom right) The absolute efficiencies corresponding to the combined 
(b, c)-jet results. 


The absolute efficiencies measured using the TOPO for b jets are: 21 ± 1% for 10-20 GeV; 
44 ± 4% for 20-30 GeV; 60 ± 5% for 30-50 GeV; and 66 ± 6% for 50-100 GeV. 
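
The relation between the columns of Table 2 can be checked numerically. Because the absolute efficiencies in data are quoted using the “(b, c) jets” scale factors, dividing each data efficiency by that scale factor gives the simulated efficiency it implies. The sketch below transcribes the central values from Table 2; the function name is an assumption, uncertainties are ignored, and the results are limited by the rounding of the tabulated values.

```python
# Central values transcribed from Table 2:
# (jet pT range in GeV, jet eta range, (b, c) scale factor,
#  data efficiency for b jets in %, data efficiency for c jets in %)
TABLE2 = [
    ("10-20", "2.2-4.2", 0.91, 38.0, 14.0),
    ("20-30", "2.2-4.2", 0.97, 61.0, 23.0),
    ("30-50", "2.2-4.2", 0.97, 65.0, 25.0),
    ("50-100", "2.2-4.2", 1.05, 70.0, 28.0),
    ("20-100", "2-2.2", 1.05, 56.0, 20.0),
]


def implied_simulation_efficiency(eff_data_percent, scale_factor):
    """Invert eff_data = scale_factor * eff_sim for eff_sim (in percent)."""
    if scale_factor <= 0:
        raise ValueError("scale factor must be positive")
    return eff_data_percent / scale_factor


if __name__ == "__main__":
    for pt_range, eta_range, sf, eff_b, eff_c in TABLE2:
        sim_b = implied_simulation_efficiency(eff_b, sf)
        sim_c = implied_simulation_efficiency(eff_c, sf)
        print(f"pT {pt_range} GeV, eta {eta_range}: "
              f"b {sim_b:.1f}%, c {sim_c:.1f}%")
```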


5 Light-parton jet misidentification 

Light-parton jets contain SVs due to any of the following: (1) misreconstruction of prompt 
particles as displaced tracks; (2) decays of long-lived strange particles; or (3) interactions 
with material. Type (1) can be studied in data using jets that contain an SV whose 
inverted direction of flight lies in the jet cone (referred to as a backward SV). Types (2) and 
(3) can be studied using SVs for which the ratio of the SV flight distance divided by the SV 
momentum is too large for the decay of a (b, c) hadron (referred to as a too-long-lived SV). 
The mistag probability for simulated light-parton jets using backward and too-long-lived 
SVs is consistent with the nominal mistag probability at the 20% level (the nominal 
mistag probability is shown in Fig. ). Furthermore, the SV BDT distributions obtained 
Table 2: SV-tagger algorithm (b, c)-tagging efficiencies measured in data compared to those 
obtained in simulation. The b and c results are obtained by combining the highest-pT track 
and muon-jet results under the assumption that the scale factors are the same for semileptonic 
and inclusive (b, c)-hadron decays. The (b, c) results are obtained by fitting the highest-pT-track 
sample under the assumption that the scale factors are the same for b and c jets. The absolute 
efficiencies observed in data are provided using the “(b, c) jets” results. 


jet pT (GeV)   jet η      ε(data)/ε(simulation)                          ε(data) (%)
                          b jets         c jets         (b, c) jets      b jets    c jets
10-20          2.2-4.2    0.89 ± 0.04    0.81 ± 0.09    0.91 ± 0.04      38 ± 2    14 ± 1
20-30          2.2-4.2    0.92 ± 0.07    0.97 ± 0.09    0.97 ± 0.04      61 ± 3    23 ± 1
30-50          2.2-4.2    1.06 ± 0.08    1.04 ± 0.09    0.97 ± 0.04      65 ± 3    25 ± 1
50-100         2.2-4.2    1.10 ± 0.09    0.81 ± 0.15    1.05 ± 0.06      70 ± 4    28 ± 4
20-100         2-2.2      1.00 ± 0.07    1.12 ± 0.10    1.05 ± 0.03      56 ± 2    20 ± 1



Figure 15: TOPO algorithm (b, c)-tagging efficiencies, using the “loose” BDT requirement, in 
data relative to those obtained in simulation, shown versus jet pT [GeV]. 


using backward and too-long-lived SVs are similar to the nominal light-parton-jet BDT 
distributions. Therefore, the mistag probability of light-parton jets and SV properties can 
be studied in data using backward and too-long-lived SV-tagged jets. 
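
The two control categories defined above can be sketched as a simple classification. The code below is illustrative only: the cone size of 0.5 is inferred from the ΔR requirements quoted elsewhere in the text, the threshold on flight distance divided by momentum is a hypothetical placeholder (the paper does not quote it here), and the function names and the choice to test the forward direction first are assumptions.

```python
import math

JET_RADIUS = 0.5  # ASSUMPTION: cone size inferred from the ΔR cuts in the text
MAX_FD_OVER_P = 1.5  # HYPOTHETICAL threshold in mm/GeV, not from the paper


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance sqrt(Δη² + Δφ²), with Δφ wrapped into [-π, π]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)


def classify_sv(sv_eta, sv_phi, flight_distance, momentum, jet_eta, jet_phi):
    """Label an SV as 'nominal', 'too-long-lived', 'backward' or 'outside'.

    (sv_eta, sv_phi) is the SV direction of flight; flight_distance is in
    mm and momentum in GeV (units are an ASSUMPTION for this sketch).
    """
    if delta_r(sv_eta, sv_phi, jet_eta, jet_phi) < JET_RADIUS:
        if flight_distance / momentum > MAX_FD_OVER_P:
            return "too-long-lived"
        return "nominal"
    # Inverting a direction maps eta -> -eta and phi -> phi + pi.
    if delta_r(-sv_eta, sv_phi + math.pi, jet_eta, jet_phi) < JET_RADIUS:
        return "backward"
    return "outside"


if __name__ == "__main__":
    print(classify_sv(3.0, 0.1, 1.0, 10.0, 3.0, 0.0))
    print(classify_sv(3.0, 0.1, 30.0, 10.0, 3.0, 0.0))
    print(classify_sv(-3.0, 0.1 + math.pi, 1.0, 10.0, 3.0, 0.0))
```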

Such a study is complicated by the fact that prompt tracks in (b, c) jets can also 
be misreconstructed as displaced, and that (b, c) jets also produce strange particles and 
material interactions. Therefore, both backward and too-long-lived SVs are also found in 
(b, c) jets. The W+jet data sample, which is dominantly composed of light-parton jets, is 
used to mitigate effects from mistagged (b, c) jets. Figure 16 shows the BDT distributions 
from backward and too-long-lived SVs observed in data compared to simulation. The 
backward and too-long-lived BDT templates are similar for all jet types. The (b, c) yields 
here are fixed by fitting the nominal SV-tagged data to obtain the total (b, c)-jet content 
then taking the backward and too-long-lived SV-tag probabilities for (b, c) jets from 
Figure 16: SV-tagger algorithm BDT distributions for backward and too-long-lived SVs in the 
W+jet data sample: (top left) distribution in data; (top right) two-dimensional template-fit 
result; and (bottom) projections of the fit result with the b, c, and light-parton contributions 
shown as stacked histograms. 


simulation. The distributions in data and simulation are consistent, which demonstrates 
that the SV properties are well-modeled for light-parton jets. 

The total light-parton-jet composition of this sample, without applying any SV-tagging 
algorithm, is found to be 95%, by fitting the nominal SV-tagged BDT distributions and 
applying the data-driven (b, c)-tagging efficiencies from the previous section. The mistag 
probability of light-parton jets is obtained as the ratio of the number of SV-tags for those 
jets (obtained by fitting the SV BDT distributions) to the total number of light-parton 
jets. The ratio of this probability in data to that in simulation is shown in Fig. 17; data 
and simulation agree at about the ±30% level integrated over jet pT. A detailed study 
of W+jet production in LHCb using the SV-tagger algorithm introduced in this paper, 
in which the jets are required to satisfy pT > 20 GeV and 2.2 < η < 4.2, finds that the 
nominal light-parton-jet mistag probability is 0.3%, which is consistent with simulation [26]. 
The same ratio for the TOPO algorithm is also shown in Fig. 17. 
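
The mistag-probability definition above amounts to a short calculation, sketched here under stated assumptions. The 95% light-parton fraction is taken from the text; the jet counts are hypothetical numbers chosen so that the result reproduces the quoted 0.3%, and the function names are illustrative.

```python
def mistag_probability(n_sv_tagged_light, n_jets_total, light_fraction):
    """Ratio of SV-tagged light-parton jets to all light-parton jets.

    n_sv_tagged_light is the SV-tagged light-parton yield (from the BDT
    fit in the paper); n_jets_total * light_fraction is the number of
    light-parton jets before any SV tagging.
    """
    n_light = n_jets_total * light_fraction
    if n_light <= 0:
        raise ValueError("sample must contain light-parton jets")
    return n_sv_tagged_light / n_light


def data_to_simulation_ratio(p_data, p_sim):
    """Ratio of mistag probabilities of the kind plotted in Fig. 17."""
    if p_sim <= 0:
        raise ValueError("simulated probability must be positive")
    return p_data / p_sim


if __name__ == "__main__":
    # HYPOTHETICAL counts: 100000 jets, 95% light-parton, 285 SV tags.
    p_data = mistag_probability(285, 100000, 0.95)
    print(f"mistag probability: {100 * p_data:.2f}%")
    print(f"data/simulation: {data_to_simulation_ratio(p_data, 0.003):.2f}")
```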


The performance of any tagging algorithm on light-parton jets can be affected by 
the presence of (b, c) jets in the event. The misidentification probability of light-parton 
jets is studied in simulated di-b-jet events and compared to the performance obtained 
in simulated events that contain no (b, c) jets. The absolute difference in the fraction 
Figure 17: Ratio of light-parton-jet mistag probabilities observed in data to those in simulation 
for the (left) SV-tagger and (right) TOPO algorithms. 


of light-parton jets that are SV-tagged and have BDT(bc|udsg) > 0.2 is found to be at 
the per mille level for low-pT jets, but increases to about 1% for jet pT of 50 GeV and 
to about 2-3% at 100 GeV. The BDT shapes are distorted relative to those obtained in 
events that contain no (b, c) jets, but there is still significant discrimination between the 
light-parton and (b, c) distributions. The difference is largely due to particles originating 
from a b-hadron decay and produced with ΔR < 0.5 relative to the light-parton-jet axis. 
These tracks may then form SVs with misreconstructed prompt tracks in the light-parton 
jets. 

6 Summary 

The LHCb collaboration has developed several algorithms that efficiently identify jets that 
arise from the hadronization of b and c quarks. The performance of these algorithms has 
been studied in data and is found to agree with that in simulation at about the 10% level 
for (b, c) jets, and at the 30% level for light-parton jets. The SV properties of all jet types 
are found to be well modeled by LHCb simulation. The efficiency for identifying a b (c) jet 
is about 65% (25%) with a probability of misidentifying a light-parton jet of 0.3% for jets 
with transverse momentum pT > 20 GeV and pseudorapidity 2.2 < η < 4.2. 